Quick overview of Doom 3


• Released under GPLv3+ (with additional terms) on November 22, 2011.
  - Without "Carmack's Reverse" (aka depth fail) shadows.

Date: Fri, 25 Nov 2011 01:32:56 +0200
Subject: [PATCH v3 1/1] renderer: added support for Carmack's Reverse (depth fail) shadows.

• OpenGL 1.x + OpenGL extensions.
  - Some of which are requirements.
• X11 and GLX.
• 8 years old.
• ARB2 backend (best backend)
  - ARB_vertex_program && ARB_fragment_program.
  - Other backends available for older hardware.
• ARB_vertex_buffer_object used when available; otherwise falls back to virtual memory.
Quick overview of Dante

• OpenGL ES2.0
  - EGL
  - GLSL primary backend
• ARB2 backend remains for debugging on the desktop; stubbed out when compiled for ES2.0
• Carmack's Reverse (depth fail) added back
  - VBO requirement
  - ARBvp and ARBfp programs are not part of the GPLv3+ release
    • "Clean-room" programs written in GLSL
    • Phong (rather than Blinn-Phong) shading model.
      - More computationally expensive but delivers much more realistic rendering.
    • Optional Half-Lambert lighting (see example on next slide: Phong + Half-Lambert.)
• Support for Android...
  - You'd better have a high-end device!
  - "Some" bugs and missing features...
Lambert vs Half-Lambert
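
The slide itself is a screenshot comparison, so here is a minimal C sketch of the two diffuse terms being compared. It assumes the commonly cited half-Lambert form (scale N·L by 0.5, add 0.5, then square); Dante's actual GLSL may differ, and the function names are made up for illustration.

```c
/* Standard Lambert diffuse term: clamp N.L so back-facing light gives 0. */
static float lambert(float n_dot_l)
{
    return n_dot_l > 0.0f ? n_dot_l : 0.0f;
}

/*
 * Half-Lambert (assumed form): remap N.L from [-1, 1] to [0, 1],
 * then square, so surfaces facing away from the light fade out
 * gradually instead of going flat black at 90 degrees.
 */
static float half_lambert(float n_dot_l)
{
    float wrapped = 0.5f * n_dot_l + 0.5f;
    return wrapped * wrapped;
}
```

At a grazing angle (N·L = 0) `lambert` returns 0 while `half_lambert` returns 0.25; both agree at N·L = 1, and half-Lambert only reaches 0 when the surface faces directly away from the light.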
Half-Lambert Gone Wrong?
Optimization on Mesa

• Unfortunately, there are no really great tools for Mesa performance analysis...
  - i965: intel_gpu_top works like regular 'top'.
    • No support for per-frame analysis,
    • No support for pretty graphs (unless you're into ASCII art.)
  - Useful for a rough estimate of GPU load.
  - Basically unusable output for game devs who haven't read and understood Intel HW docs.
    • Game devs typically don't want to read low-level HW docs...
• So, what should we do to fix this for Mesa drivers?
  - Quick example of intel_gpu_top first...


Oliver McFadden 


intel_gpu_top

   render busy:  37%: ########          render space: 69/131072
bitstream busy:   0%:                   bitstream space: 0/131072
  blitter busy:  36%: ########          blitter space: 30/131072

  task  percent busy
   GAM:  68%: ##############    vert fetch: 0 (0/sec)
    CS:  37%: ########          prim fetch: 0 (0/sec)
   PSD:  32%: #######           VS invocations: 33076780 (1617385/sec)
   DAP:  28%: ######            GS invocations: 0 (0/sec)
 RCPFE:  28%: ######            GS prims: 0 (0/sec)
    IZ:  28%: ######            CL invocations: 16538390 (808744/sec)
 RCPBE:  28%: ######            CL prims: 11324693 (689777/sec)
   RCC:  28%: ######            PS invocations: 11415570625 (347597257/sec)
  WMFE:  28%: ######            PS depth pass: 11132520857 (340957947/sec)
 EU 30:  26%: ######
   SVG:  26%: ######
 EU 10:  25%: #####
 EU 00:  25%: #####
   HIZ:  25%: #####
 EU 20:  25%: #####
    TD:  25%: #####
  SVRW:  25%: #####
  IC 3:  23%: #####
  WMBE:  23%: #####
  IC 2:  23%: #####
  IC 0:  23%: #####
  IC 1:  23%: #####
 EU 01:  22%: #####


Better debugging/analysis tools! 


• AMD's gDEBugger works on GNU/Linux, but only with AMD hardware and fglrx.
  - Older pre-AMD versions used to run with Mesa, but have problems with modern glibc.
  - Proprietary tool (both pre- and post-AMD versions.)
  - Basically unusable for me...
• Nvidia, SGX, ... have similar tools for their proprietary drivers.
• We don't have any great tools for Mesa...
  - But we should!
Performance Graph

Counter Name                     Value       Scaled Value
Frames/sec: Context 1            84          84 [1]
geom_busy                        0           [1]
OGL memory allocated             21594112    21 [1/1,048,576 (MB)]
OGL memory allocated (vertex)    10555392
rop_busy                         3
shader_busy                      4
Linux kernel and perf system...

• https://perf.wiki.kernel.org
  - Stumbled across this by accident while looking at CPU profiling.

perf provides rich generalized abstractions over hardware specific capabilities. Among others, it provides per task, per CPU and per-workload counters, sampling on top of these and source code event annotation.

  - perf stat: obtain event counts
  - perf record: record events for later reporting
  - perf report: break down events by process, function, etc.
  - perf annotate: annotate assembly or source code with event counts
  - perf top: see live event count
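
The perf tools listed above sit on top of the perf_event_open(2) system call. As a rough idea of what a counter looks like from userspace, here is a minimal C sketch that counts user-space instructions around a workload. The helper names `count_instructions` and `spin` are invented for this example, and the call may simply fail (returning -1 here) in a VM or when perf_event_paranoid forbids it.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Count user-space instructions retired while work() runs in this thread.
 * Returns the count, or -1 if the counter cannot be opened or read.
 */
static int64_t count_instructions(void (*work)(void))
{
    struct perf_event_attr attr;
    uint64_t count = 0;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1; /* user space only */
    attr.exclude_hv = 1;

    /* pid 0, cpu -1: this thread, on whichever CPU it runs. */
    fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        close(fd);
        return -1;
    }
    close(fd);
    return (int64_t)count;
}

/* Sample workload (hypothetical): a small busy loop. */
static void spin(void)
{
    volatile unsigned sum = 0;
    for (unsigned i = 0; i < 100000; i++)
        sum += i;
}
```

Calling `count_instructions(spin)` gives either an instruction count or -1. This only covers CPU counters; nothing here talks to the GPU.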
Kernel perf system and Mesa

• Possibly create infrastructure in DRM and hook into perf sub-system?
• Needs some cooperation with userspace:
  - Mesa should indicate frame termination without causing a stall, e.g.
    OUT_BATCH(SCRATCH_REG_0, 0xDEADD00D);
  - Could be done at swap buffers or more intelligently with the GL_GREMEDY_frame_terminator extension (with application support.)
• Userspace debugger could read the data from kernel and generate pretty graphs, suggestions, etc.
  - Interactive GUI,
  - HTML report,
  - ASCII art. ;-)
• Very much hand waving at this point. No prototype implementation.
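
To make the frame-termination idea a bit more concrete, here is a toy C sketch that writes a marker into a simulated batch buffer at the end of each frame and later scans for it. Everything here is hypothetical: `struct batch` and the `batch_*` helpers are stand-ins, not Mesa or DRM code, and the extra frame-number dword is an assumption added for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_MARKER 0xDEADD00Du /* marker value shown on the slide */
#define BATCH_DWORDS 64          /* arbitrary size for this sketch */

/* Hypothetical stand-in for a driver batch buffer. */
struct batch {
    uint32_t dw[BATCH_DWORDS];
    size_t used;
};

/* Append one dword; returns 0 on success, -1 if the batch is full. */
static int batch_emit(struct batch *b, uint32_t dword)
{
    if (b->used >= BATCH_DWORDS)
        return -1;
    b->dw[b->used++] = dword;
    return 0;
}

/* Mark the end of a frame: the marker followed by the frame number. */
static int batch_end_frame(struct batch *b, uint32_t frame_no)
{
    if (b->used + 2 > BATCH_DWORDS)
        return -1;
    batch_emit(b, FRAME_MARKER);
    batch_emit(b, frame_no);
    return 0;
}

/*
 * Count frame markers and store the last frame number seen in *last_frame.
 * A real command stream would need a marker that cannot alias ordinary data.
 */
static size_t batch_count_frames(const struct batch *b, uint32_t *last_frame)
{
    size_t frames = 0;

    for (size_t i = 0; i + 1 < b->used; i++) {
        if (b->dw[i] == FRAME_MARKER) {
            *last_frame = b->dw[i + 1];
            frames++;
            i++; /* skip the frame number payload */
        }
    }
    return frames;
}
```

The point of the marker is that emitting it is just two more dwords in the stream, so the driver never has to wait for the GPU to say where a frame ended.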
Mesa debug output

• Mesa drivers may be able to provide "hints" for the OpenGL application:

    if (ctx->Scissor.Enabled)
        perf_debug("Failed to fast clear depth due to scissor being enabled. "
                   "Possible 5%% performance win if avoided.\n");

  - 20 dwords to change surface state (disable the scissor test.)
  - How to synchronize these with the data from kernel perf system?
  - Possibly with a carefully managed frame counter?
    • Userspace debugger could match frame counter of data fetched from perf system and strings fetched from ARB_debug_output.
  - Currently perf_debug() does not output to ARB_debug_output!
• ARB_debug_output works as long as the debugger and OpenGL application are in the same context...
  - But we probably do not want such a solution; it's ugly and we lose any benefits of having the debugger as a separate process.
  - Not quite sure how to handle Mesa debug output with the debugger in a separate process... Suggestions?
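
The frame-counter matching could look something like the C sketch below: given a sample tagged with a frame number, collect the driver hints recorded for the same frame. The structs and `hints_for_frame` are invented for illustration; no such interface exists in Mesa or the kernel today.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record read back from the kernel perf side. */
struct perf_sample {
    uint32_t frame;        /* frame counter at sample time */
    unsigned gpu_busy_pct; /* example payload */
};

/* Hypothetical record captured from the driver's debug output. */
struct driver_hint {
    uint32_t frame;   /* frame counter when the hint was emitted */
    const char *text; /* hint string */
};

/*
 * Collect pointers to the hints emitted during the given frame.
 * Stores at most out_cap pointers in out and returns how many were stored.
 */
static size_t hints_for_frame(uint32_t frame,
                              const struct driver_hint *hints, size_t n_hints,
                              const struct driver_hint **out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n_hints && found < out_cap; i++) {
        if (hints[i].frame == frame)
            out[found++] = &hints[i];
    }
    return found;
}
```

A debugger would call `hints_for_frame(sample.frame, ...)` for each slow sample and show the returned strings next to the counters. This only works if both sides agree on when the counter increments, which is the "carefully managed" part.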
GLX vs EGL

• Dante (OpenGL ES2.0, X11 (XCreateWindow et al), EGL):
  - +timedemo demo1
  - vblank_mode=0

  2148 frames rendered in 64.6 seconds = 33.3 fps
  MessageBox: Time Demo Results - 2148 frames rendered in 64.6 seconds = 33.3 fps

• Dante (OpenGL ES2.0, X11 (XCreateWindow et al), GLX):
  - +timedemo demo1
  - vblank_mode=0

  2148 frames rendered in 47.2 seconds = 45.5 fps
  MessageBox: Time Demo Results - 2148 frames rendered in 47.2 seconds = 45.5 fps

• Mesa appears to ignore vblank_mode in the EGL code...


src/egl/drivers/dri2/platform_x11.c-   struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf);
src/egl/drivers/dri2/platform_x11.c-#endif
src/egl/drivers/dri2/platform_x11.c-
src/egl/drivers/dri2/platform_x11.c:   /* XXX Check vblank_mode here? */
src/egl/drivers/dri2/platform_x11.c-
src/egl/drivers/dri2/platform_x11.c-   if (interval > surf->Config->MaxSwapInterval)
src/egl/drivers/dri2/platform_x11.c-      interval = surf->Config->MaxSwapInterval;
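
For what the "XXX" comment might turn into, here is a small C sketch of folding a vblank_mode setting into the swap interval. The four mode values follow my understanding of Mesa's vblank_mode driconf option (0 = never sync, 1 and 2 = application decides, 3 = always sync); the enum names and `effective_swap_interval` are made up here and this is not the actual Mesa fix.

```c
/* Assumed meaning of the vblank_mode option values. */
enum vblank_mode {
    VBLANK_NEVER = 0,          /* never synchronize with vertical refresh */
    VBLANK_DEF_INTERVAL_0 = 1, /* default interval 0, application may change it */
    VBLANK_DEF_INTERVAL_1 = 2, /* default interval 1, application may change it */
    VBLANK_ALWAYS_SYNC = 3     /* always synchronize, interval at least 1 */
};

/*
 * Clamp the requested interval to [0, max_interval] as the existing code
 * does for the upper bound, then let vblank_mode override the result.
 */
static int effective_swap_interval(int requested, enum vblank_mode mode,
                                   int max_interval)
{
    int interval = requested;

    if (interval > max_interval)
        interval = max_interval;
    if (interval < 0)
        interval = 0;

    switch (mode) {
    case VBLANK_NEVER:
        return 0; /* user asked for no sync, whatever the app requests */
    case VBLANK_ALWAYS_SYNC:
        return interval < 1 ? 1 : interval;
    default:
        return interval;
    }
}
```

With vblank_mode=0 this always yields an interval of 0, which is the behaviour the timedemo runs above expected from both the GLX and the EGL path.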
Conclusion

• Bottom line: We need better performance analysis tools.
• Intel has done work on Mesa/i965 optimization with Valve Software for their "Left 4 Dead 2" game:
  - Eric Anholt, Ian Romanick, and Ken Graunke at Valve's headquarters in person.
  - Possible for a large game development studio,
  - Not possible for indie game developers.
• Performance tools will never be as good as experts in person, but can still be very useful.
Questions? / Comments?